1st Year Graduate Student in Quantitative Psychology
MSc in Psychiatry at University of Sao Paulo, Brazil
Advised by Dr. Philippe Rast
Studying intraindividual variability
All analyses will use the R computing language
Assignments are released after lab and are due before the next lab session
The instructor/TA will post an answer key to the course website on the due date. For this reason, late homework will not be accepted
Use the homework template to write your answers and submit a PDF version on Canvas. Paste your code when required
Office hours on Thursday 3-5PM at Young Hall 266
Questions via email at mmcarmo@ucdavis.edu
The lab and homework materials heavily rely on the work of previous 103B TAs, Paprika Jiang and Simran Johal.
Source: YaRrr! The Pirate’s Guide to R
To run a line of code just use CTRL + ENTER (that’s COMMAND + ENTER if you’re on a Mac)
How would I ask R to divide 10 by 2?
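Division uses the / operator. A minimal example that stores the result in an object (the name answer is only for illustration):

```r
# Divide 10 by 2 with the / operator and store the result
answer <- 10 / 2
answer
# [1] 5
```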
When used as a calculator, R follows the PEMDAS rule:
Parentheses, exponents, multiplication and division (left to right), then addition and subtraction (left to right)
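A few sketches of the order of operations. Note that multiplication and division share one level, as do addition and subtraction, so R works through each level left to right:

```r
2 + 3 * 4      # multiplication first: 14
(2 + 3) * 4    # parentheses first: 20
2 * 3^2        # exponent before multiplication: 18
8 / 2 * 4      # same level, left to right: 16 (not 1)
10 - 4 + 2     # same level, left to right: 8 (not 4)
```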
Assign a value to an object with <- or =
Typing an object's name (e.g., a) prints its value
Bad (names cannot start with a number or contain symbols like !, so R returns an error)
1a <- 3
!a <- 3
a! <- 3
Better (valid names use letters, numbers, dots, and underscores)
a1 <- 3
a_object <- 3
a.object <- 3
aObject <- 3
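A short sketch of assignment with valid names, plus a check that a "bad" name is rejected. The object names total and bad are hypothetical, chosen only for this example:

```r
# Both <- and = store a value in an object
a1 <- 3
a_object = 5

# Typing a name prints the value; objects can be reused in calculations
a1
total <- a1 + a_object
total   # 8

# R is case sensitive: aObject and aobject would be two different objects

# A name starting with a number fails before R even runs the line
bad <- tryCatch(parse(text = "1a <- 3"), error = function(e) "error")
bad     # "error"
```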
Character data types must be surrounded by quotation marks
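A minimal sketch of character data; the object name status is hypothetical:

```r
# Quotation marks tell R this is text, not the name of an object
status <- "student"
status          # "student"
class(status)   # "character"

# Without quotes, R would look for an object called student instead:
# student       # Error: object 'student' not found
```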
Logical data types have only two possible values: TRUE and FALSE
They must be in all caps
TRUE can be shortened to T and FALSE to F (not recommended: T and F are ordinary objects that can be overwritten, e.g., T <- 0)
Vectors allow saving multiple pieces of information to an object
The individual values within a vector are called “elements”
To do this we can use the c() function
This function combines different pieces of information together
Printing first_vector shows that we saved four pieces of information (four numbers) to the vector
We can check the number of elements in a vector with the length() function
A vector is a one-dimensional collection of information
All the elements must be the same type
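A sketch of creating a vector and checking its length. The four values are hypothetical, since the lab's own numbers are not shown here. The last lines show what "same type" means in practice: mixing types makes R convert everything to one common type.

```r
# c() combines values into a vector (hypothetical values)
first_vector <- c(2, 4, 6, 8)
length(first_vector)   # 4

# Mixing types: R converts every element to character
mixed <- c(1, "a", TRUE)
mixed          # "1" "a" "TRUE"
class(mixed)   # "character"
```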
To pull out one element from a vector that has multiple elements, we need to subset the vector
Use square brackets [] after the label name with the element number we would like to recover inside the brackets
myvector[1]
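A sketch of subsetting with square brackets, using hypothetical values for myvector:

```r
myvector <- c(10, 20, 30, 40)   # hypothetical values

myvector[1]         # first element: 10
myvector[c(1, 3)]   # first and third elements: 10 30
myvector[-1]        # a negative index drops that element: 20 30 40
```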
Functions are pre-written pieces of code that accomplish some task
Rather than writing out the code to do this task, we can call a function by its label and it will complete that task
Let’s say I wanted to calculate the mean (or average) of our numeric vector
What if first_vector had 100 numbers not 4?
Or what if you had 100 vectors and wanted to calculate the means of each one
The mean() function does this for us
Arguments are the information we give a function so it can carry out its task
A function can have multiple arguments
functionLabel(argument1, argument2, argument3)
For the mean() function, the first argument was the data or the number we wanted the mean of
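A sketch comparing the mean computed by hand with mean(). The values of first_vector are hypothetical, and hundred_numbers is an illustrative name showing that the same call works for any vector length:

```r
first_vector <- c(2, 4, 6, 8)   # hypothetical values

# By hand: add up the elements and divide by how many there are
sum(first_vector) / length(first_vector)   # 5

# With the function: the data is the first argument
mean(first_vector)                         # 5

# The same one-line call works for 100 numbers
hundred_numbers <- 1:100
mean(hundred_numbers)                      # 50.5
```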
round() rounds the number you give it as the first argument (to a whole number by default)
What if I want it to round the number to two decimal places?
We can add another argument
Each argument has a label (name); for example, round() has x and digits
Sometimes we don’t use labels for convenience, but they are helpful, especially when a function has many arguments
When you don’t use labels, R matches arguments by position, so the order really matters
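A sketch of how labeled and unlabeled arguments behave in round(), whose arguments are x and digits:

```r
round(3.14159)                   # default digits = 0: 3
round(3.14159, 2)                # second argument is digits: 3.14
round(digits = 2, x = 3.14159)   # with labels, order does not matter: 3.14

# Without labels, R matches by position: here 2 is x and 3.14159 is digits
round(2, 3.14159)                # 2, not what we wanted
```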
To learn more about a built-in function and its arguments, type ?function_name in the console (e.g., ?round) to open that function’s help page
In RStudio, you can also press Tab while your cursor is inside the function’s parentheses to see its arguments
Another useful function is class()
The class() function tells you what kind of data (or object) you are working with
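A few sketches of class() on the data types and structures from these slides:

```r
class(42)                    # "numeric"
class("student")             # "character"
class(TRUE)                  # "logical"
class(c(1, 2, 3))            # "numeric": a vector reports the type of its elements
class(data.frame(x = 1))     # "data.frame"
class(matrix(1:4, nrow = 2)) # "matrix" "array" (in R 4.0 and later)
```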
1- Vector
2- Matrix
3- Array
4- Data frame
5- List
A matrix is a two-dimensional dataset (it has rows and columns) that holds only one data type
matrix(data, nrow, ncol, byrow)
byrow = FALSE (the default) fills the matrix column by column; byrow = TRUE fills it row by row
dim() tells you the dimensions of a matrix
First element is number of rows, second element is number of columns
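A sketch of matrix(), byrow, and dim(), plus cbind() and rbind(), which build a matrix from separate vectors. The object names are illustrative:

```r
# Default byrow = FALSE: fill column by column
m_col <- matrix(1:6, nrow = 2, ncol = 3)
m_col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# byrow = TRUE: fill row by row
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

dim(m_col)    # 2 3: rows first, then columns

# cbind() makes each vector a column; rbind() makes each vector a row
x <- c(1, 2, 3)
y <- c(4, 5, 6)
cbind(x, y)   # 3 x 2 matrix
rbind(x, y)   # 2 x 3 matrix
```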
Use the cbind() function to make each vector its own column in a matrix
Use the rbind() function to make each vector its own row in a matrix
Data frames allow you to have multiple data types (each column can hold a different type)
We can use the data.frame() function to create a data frame
Each argument is a different column
We can also add labels to each column
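A sketch of data.frame() with labeled columns of different types. All values and the name students are hypothetical:

```r
# Each argument becomes a column; the argument name becomes the column label
students <- data.frame(
  name   = c("Ana", "Ben", "Cai"),   # character column
  age    = c(19, 21, 20),            # numeric column
  senior = c(FALSE, TRUE, FALSE)     # logical column
)
students
str(students)   # one line per column, showing each column's type
```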
We’ve used square brackets [] to subset vectors
We can use square brackets [] to subset two-dimensional objects like a matrix or data frame
twoDimObject[row#, column#]
To pull out a column of a data frame by name, use the $ operator: twoDimObject$ColumnName (this works for data frames, not matrices)
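A sketch of both subsetting styles on a small hypothetical data frame and matrix:

```r
students <- data.frame(
  name = c("Ana", "Ben", "Cai"),   # hypothetical data
  age  = c(19, 21, 20)
)

students[2, 1]      # row 2, column 1: "Ben"
students[2, ]       # leave the column blank to get the whole row
students[, "age"]   # columns can be selected by name
students$age        # the $ operator returns the same column

m <- matrix(1:6, nrow = 2)
m[1, 3]             # matrices use the same [row, column] rule: 5
```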
The mean, median, and mode are the 3 most common measures of central tendency
They are used to describe a distribution of observations (e.g., all the grades on an exam) in one number that best represents that distribution
Suppose we asked a bunch of UC Davis students how many hours per week they spent watching Netflix, and how many hours they spent exercising during Winter break:
But is the mean a good representation of these data?
Take a look again at the values and see if you find something odd
One person watches Netflix 40 hours a week
Another exercised 45 hours a week
When we have outliers, the median is often a better representation of the data: a single extreme value pulls the mean toward it but barely moves the median
Remember, the median is the middle value of your data, after you have ordered it
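A sketch of how one outlier affects the mean but not the median. The Netflix hours are hypothetical, with one 40-hour value and one missing answer (NA); na.rm = TRUE tells mean() and median() to skip the missing value:

```r
# Hypothetical weekly Netflix hours; one outlier and one missing answer
netflix <- c(2, 3, 4, 5, 5, 6, 40, NA)

mean(netflix)                  # NA, because of the missing value
mean(netflix, na.rm = TRUE)    # about 9.29, pulled up by the 40
median(netflix, na.rm = TRUE)  # 5, the middle of the sorted values
```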
Sometimes, we can’t do arithmetic on the data we have
If we had asked our 15 participants what their favorite flavor of ice cream was, we would not be able to describe that distribution using a mean or a median. We would have to use the mode
The mode is just the most frequent value
R doesn’t have a built-in function for the mode because it is not used very often (the base mode() function does something else: it reports how an object is stored), so we use the table() function
table() gives you the number of times each element shows up in an object
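A sketch of finding the mode with table(), then ordering the counts with sort(). The flavor answers are hypothetical; if two values tie for the highest count, the data have more than one mode:

```r
# Hypothetical favorite flavors
flavors <- c("vanilla", "chocolate", "vanilla", "mint", "chocolate", "vanilla")

counts <- table(flavors)
counts                            # chocolate 2, mint 1, vanilla 3

# Largest count first: the first name is the mode
sort(counts, decreasing = TRUE)
names(sort(counts, decreasing = TRUE))[1]   # "vanilla"
```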
We can use the sort() function on the result of table() to order it
Do all the students exercise about the same? Or do some students exercise a lot while others don’t?
How are the observations spread out around the mean or median?
We’ll look at two related measures of spread: variance and standard deviation
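A sketch of var() and sd() alongside the by-hand calculation. The exercise hours are hypothetical. R's var() divides the sum of squared differences from the mean by n - 1, and the standard deviation is the square root of the variance:

```r
exercise <- c(1, 2, 3, 4, 5)   # hypothetical weekly exercise hours

# By hand: squared differences from the mean, divided by n - 1
by_hand <- sum((exercise - mean(exercise))^2) / (length(exercise) - 1)
by_hand          # 2.5

var(exercise)    # 2.5
sd(exercise)     # about 1.58, the square root of the variance
```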
Variance: var() function
Standard deviation: sd() function
In our imaginary example, each person gave us two pieces of information: their exercise hours and their Netflix hours
Let’s organize our data into a data frame to better keep track of it:
If you are in RStudio, you can look at df by clicking on it in the Environment pane, using View(), typing its name in the console, or using functions like head() and tail(). Can you tell what those do?
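A sketch of building df and peeking at it with head() and tail(). The hours below are hypothetical stand-ins for the lab's data (which are not shown here):

```r
# Hypothetical hours per week for 8 students
df <- data.frame(
  netflix  = c(10, 4, 8, 40, 6, 12, 3, 7),
  exercise = c(3, 8, 5, 1, 45, 2, 9, 6)
)

head(df)      # first 6 rows (the default)
tail(df, 3)   # last 3 rows
nrow(df)      # number of participants: 8
```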
It’s also a good idea to plot your data
This helps you get a general idea for what your data looks like, and to see if there is anything weird going on
To make a scatterplot, we can use the plot() function in R
We can make our plot look nicer by changing the axis labels, and even give it a title
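A sketch of a labeled scatterplot using base R's plot() with its xlab, ylab, and main arguments. The data frame repeats hypothetical values so the block runs on its own:

```r
df <- data.frame(
  netflix  = c(10, 4, 8, 40, 6, 12, 3, 7),   # hypothetical hours
  exercise = c(3, 8, 5, 1, 45, 2, 9, 6)
)

plot(x = df$netflix, y = df$exercise,
     xlab = "Netflix (hours per week)",
     ylab = "Exercise (hours per week)",
     main = "Netflix vs. exercise over Winter break")
```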
What if we wanted to quantify this relation? We can use the covariance and correlation!
The covariance between two variables is a measure of how the two variables change together
It only makes sense when the observations are paired, i.e., both variables were measured on the same people
It resembles the variance, but instead of squaring each difference from the mean, we multiply each person’s difference from the mean on one variable by their difference from the mean on the other
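A sketch of the covariance computed by hand, following the description above, and checked against cov(). The data are hypothetical. The last line shows the link to the variance: the covariance of a variable with itself is its variance.

```r
netflix  <- c(10, 4, 8, 6, 12)   # hypothetical hours
exercise <- c(3, 8, 5, 6, 2)

# Each person's difference from the mean on each variable
dev_netflix  <- netflix - mean(netflix)
dev_exercise <- exercise - mean(exercise)

# Multiply the paired differences, add them up, divide by n - 1
cov_by_hand <- sum(dev_netflix * dev_exercise) / (length(netflix) - 1)
cov_by_hand               # -7.5
cov(netflix, exercise)    # -7.5

cov(netflix, netflix)     # same as var(netflix): 10
```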
In cov(), the use = "complete.obs" argument acts similarly to na.rm = TRUE: it only uses data from people who answered both the Netflix and the exercise questions
We don’t know how strong this association is, because covariances are on arbitrary scales that depend on the scales of the original variables
We don’t know how large a covariance could get, so we can’t tell whether this value is large or small
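A sketch of both points with hypothetical data: a missing answer makes cov() return NA unless use = "complete.obs" is set, and recording the same Netflix time in minutes instead of hours multiplies the covariance by 60 even though the relation itself has not changed:

```r
# Hypothetical data; the last person skipped the Netflix question
netflix  <- c(10, 4, 8, 6, 12, NA)
exercise <- c(3, 8, 5, 6, 2, 7)

cov(netflix, exercise)                          # NA
cov(netflix, exercise, use = "complete.obs")    # -7.5, from the 5 complete pairs

# Same data, Netflix in minutes: the covariance is 60 times larger
cov(netflix * 60, exercise, use = "complete.obs")   # -450
```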
Correlations can only range between -1 and 1, so they’re easier to interpret
We’ll standardize the covariance to get a correlation
Standardizing in this case means dividing the covariance by the product of the two variables’ standard deviations: r = cov(X, Y) / (sd(X) * sd(Y))
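A sketch checking that dividing the covariance by both standard deviations gives the same value as cor(), using hypothetical data. Unlike the covariance, the correlation does not change when Netflix time is rescaled to minutes:

```r
netflix  <- c(10, 4, 8, 6, 12)   # hypothetical hours
exercise <- c(3, 8, 5, 6, 2)

r_by_hand <- cov(netflix, exercise) / (sd(netflix) * sd(exercise))
r_by_hand
cor(netflix, exercise)        # same value

cor(netflix * 60, exercise)   # unchanged by rescaling
```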
Since the correlation ranges between -1 and 1, we can say something about the strength of this relation
Based on some rules of thumb we can say there is a weak negative relation between watching Netflix and exercise
What is considered strong vs. weak can depend on the area of research you’re in
Next week: statistical test to see whether this correlation is significantly different from 0 or not!
PSC 103B - Statistical Analysis of Psychological Data